The Impact of the COVID-19 Pandemic on Mental Health
STA141B Final Project by Bryan Kim

Introduction

Our final project explores the effects of the COVID-19 pandemic on mental health, specifically in the United States. We will attempt to answer the following questions:

How has the progression of the COVID-19 pandemic affected the mental health of people in the United States?

Has the COVID-19 pandemic increased mental health disparities in the United States?

These questions are meaningful to everyone in the United States, as many people have been negatively impacted by the pandemic and would benefit from increased focus on mental health in research and policy related to recovery from the pandemic. We hope that our findings will help advance knowledge about Americans’ vulnerability to negative mental health outcomes so that susceptible groups can be prioritized in government response and assistance programs related to COVID-19 or future pandemics.

Our team will use a time series data set from the U.S. Census Bureau’s Household Pulse Survey (HPS) to describe indicators of mental health outcomes associated with the COVID-19 pandemic. The HPS was developed to help understand the social and economic impacts of COVID-19 on American households. The time series data set we will use contains measures of anxiety and depression across various demographics in the United States. We will also relate the HPS data to data from the U.S. Census Bureau’s American Community Survey (ACS) to further analyze the proportion of people experiencing a change in mental health. The ACS is one of the most comprehensive sources of population information about the United States, so we will use ACS data to identify other social, economic, housing, or demographic factors associated with changes in mental health during the pandemic. In addition, our team will use web scraping on various social media platforms to see if there is an increase in the frequency of posts regarding anxiety, depression, or loneliness since the start of the pandemic.

Our research is organized into three parts:

Part 1: Analyzing Census Data by Demographics

Part 2: Visualizing Change in Mental Health Over Time

Part 3: Webscraping Reddit


Part 1: Analyzing Census Data by Demographics

Methodology:

We will join together the Household Pulse Survey data and the American Community Survey data to determine whether economic and social factors are associated with mental health outcomes (anxiety and depression during the pandemic).

The most recent ACS data was collected in 2020. According to the US Census Bureau, the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. Therefore, we will only look at median household income and access to internet in 2020. Since we are using statewide 2020 ACS data, we can only join it with statewide HPS data collected in 2020.

We will analyze the data by comparing indicator values for anxiety or depression across demographics. In particular, we will investigate how differences in age, ethnicity, education, state, access to computers and internet, and median household income may affect indicator values for anxiety or depression.

Step 1: Tidy the data sets
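A minimal sketch of what tidying could involve, using only the Python standard library. The column names ("Indicator", "Group", "State", "Subgroup", "Time Period", "Value") and the indicator label are assumptions about the layout of the HPS table, and the sample rows are invented for illustration, not survey results.

```python
import csv
import io

# Hypothetical excerpt shaped like the HPS indicator table.
# Column names are assumptions and every value is invented.
RAW_CSV = """Indicator,Group,State,Subgroup,Time Period,Value
Symptoms of Anxiety Disorder or Depressive Disorder,By Age,United States,18 - 29 years,1,46.8
Symptoms of Anxiety Disorder or Depressive Disorder,By Age,United States,80 years and above,1,
Symptoms of Depressive Disorder,By State,State A,State A,1,25.0
Symptoms of Anxiety Disorder or Depressive Disorder,By State,State A,State A,1,37.5
"""

TARGET_INDICATOR = "Symptoms of Anxiety Disorder or Depressive Disorder"


def tidy_hps(rows, indicator=TARGET_INDICATOR):
    """Keep one indicator, drop missing estimates, and convert types."""
    tidy = []
    for row in rows:
        if row["Indicator"] != indicator:
            continue
        raw_value = (row["Value"] or "").strip()
        if not raw_value:  # missing or suppressed estimate
            continue
        tidy.append({
            "group": row["Group"],
            "state": row["State"],
            "subgroup": row["Subgroup"],
            "week": int(row["Time Period"]),
            "value": float(raw_value),
        })
    return tidy


tidy_rows = tidy_hps(csv.DictReader(io.StringIO(RAW_CSV)))
for row in tidy_rows:
    print(row)
```

Filtering to a single indicator before any aggregation avoids mixing the separate anxiety, depression, and combined measures in one mean.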

Step 2: Join the data sets
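The join on state can be sketched as an inner join keyed on the state name. The field names (`state`, `value`, `median_income`, `internet_pct`) are hypothetical, and the rows below are invented placeholders rather than HPS or ACS figures.

```python
# Hypothetical state-level rows; all values are invented for illustration.
hps_states = [
    {"state": "State A", "value": 37.5},
    {"state": "State B", "value": 39.0},
    {"state": "State C", "value": 41.0},  # no ACS match in this sample
]
acs_states = [
    {"state": "State A", "median_income": 80000, "internet_pct": 92.0},
    {"state": "State B", "median_income": 65000, "internet_pct": 89.0},
]


def join_by_state(hps_rows, acs_rows):
    """Inner join: keep only states that appear in both tables."""
    acs_by_state = {row["state"]: row for row in acs_rows}
    joined = []
    for row in hps_rows:
        match = acs_by_state.get(row["state"])
        if match is not None:
            # Merge the ACS fields with the HPS fields for the same state.
            joined.append({**match, **row})
    return joined


joined_rows = join_by_state(hps_states, acs_states)
for row in joined_rows:
    print(row)
```

An inner join silently drops states missing from either table, so it is worth comparing the row count before and after the join.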

Step 3: Visualizations & Analysis

We will begin visualizing results from the joined data set based on age, ethnicity, and education level.

According to the barplot above, age seems to correlate with the indicator value, which measures the level of depression or anxiety reported. The youngest age bin has the highest mean value and the oldest age bin has the lowest mean value. The mean value decreases as the age bin increases.

According to the barplot above, Non-Hispanic Asians have the lowest mean value while Non-Hispanic (other races and multiple races) have the highest mean value.

According to the barplot above, a lower level of education seems to correlate with higher indicator values for depression or anxiety. Interestingly, those who have some college or an Associate's degree have a higher mean value than those with a high school diploma or GED. However, those with less than a high school diploma have the highest mean indicator value overall, and those with a Bachelor's degree or higher have the lowest.
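The mean values behind bar plots like these can be computed by grouping rows by subgroup and averaging. The sketch below assumes each row has `subgroup` and `value` fields (hypothetical names) and uses invented numbers.

```python
from collections import defaultdict

# Invented rows for illustration; field names are assumptions.
sample_rows = [
    {"subgroup": "18 - 29 years", "value": 48.0},
    {"subgroup": "18 - 29 years", "value": 52.0},
    {"subgroup": "80 years and above", "value": 20.0},
    {"subgroup": "80 years and above", "value": 24.0},
]


def mean_by_subgroup(rows):
    """Return {subgroup: mean value}, ordered from highest to lowest mean."""
    totals = defaultdict(lambda: [0.0, 0])  # subgroup -> [sum, count]
    for row in rows:
        bucket = totals[row["subgroup"]]
        bucket[0] += row["value"]
        bucket[1] += 1
    means = {name: total / count for name, (total, count) in totals.items()}
    return dict(sorted(means.items(), key=lambda item: item[1], reverse=True))


group_means = mean_by_subgroup(sample_rows)
print(group_means)
```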

The highest percentage of respondents who reported symptoms of anxiety or depression typically came from southern and western states.

According to the scatterplot above, the proportion of people with computer and internet access seems to have a negative relationship with the indicator value for depression or anxiety. That is, states where a higher proportion of people have computer and internet access tend to have lower indicator values for depression or anxiety, although this association alone does not show that access causes the difference.

According to the scatterplot above, there seems to be little to no correlation between median household income and the indicator value for depression or anxiety.
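The strength of a relationship seen in a scatterplot can be summarized with the Pearson correlation coefficient. The sketch below computes it directly from its definition. The report does not state which statistic, if any, was computed, so this is one possible check, and the numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Invented state-level values for illustration only.
internet_pct = [85.0, 88.0, 91.0, 94.0]
indicator = [42.0, 40.0, 38.0, 36.0]
r_internet = pearson_r(internet_pct, indicator)
print(round(r_internet, 3))
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0 matches the "little to no correlation" reading of a scatterplot.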

Step 4: Observations and Comments

Before analyzing our data set, we expected that certain demographics would have higher indicator values for depression or anxiety. For example, we hypothesized that older age groups would have higher indicator values for depression or anxiety because they are the most susceptible to COVID-19. We also hypothesized that Asian populations would experience higher levels of depression or anxiety due to a rise in anti-Asian sentiment following the COVID-19 outbreak. Finally, we expected that those with higher median household incomes would experience far lower levels of depression or anxiety than those with lower incomes.

After analyzing the data set, we found that our expectations were defied. According to our visualizations, the youngest groups experienced the highest levels of depression or anxiety while the oldest groups experienced the lowest levels. We also found that Asian populations in fact had the lowest levels of depression or anxiety. Lastly, we found that median household income had little to no correlation with levels of depression or anxiety.

Part 2: Visualizing Change in Mental Health Over Time

Methodology:

To address one of our key questions, "Has the COVID-19 pandemic increased mental health disparities in the United States?", we will create time series graphs for various demographics to investigate whether disparities among them have increased over the pandemic. In particular, we will look at disparities by age, ethnicity, and education from April 2020 to December 2020. Time is measured in weeks, starting on April 23, 2020.
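One way to read the week numbering is as a 1-based count of 7-day blocks from April 23, 2020. The sketch below encodes that reading. The 1-based convention is an assumption, and the survey's own collection periods may not align exactly with calendar weeks.

```python
from datetime import date

SURVEY_START = date(2020, 4, 23)  # start date stated in the text


def week_index(day, start=SURVEY_START):
    """Return the 1-based week number of `day`, counting 7-day blocks from `start`."""
    offset = (day - start).days
    if offset < 0:
        raise ValueError("date is before the start of the time period")
    return offset // 7 + 1


print(week_index(date(2020, 4, 23)))
print(week_index(date(2020, 4, 30)))
```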

We will investigate the biggest disparities found in Part 1: ages 18-29 versus ages 80+, Non-Hispanic Asians versus Non-Hispanic (other races and multiple races), and less than a high school diploma versus a bachelor's degree or higher.

According to the time series plots above, levels of depression or anxiety have gone up for ages 18 to 29 and have gone down for ages 80+.

According to the time series plots above, levels of depression or anxiety for Non-Hispanic Asians have dropped significantly while levels for Non-Hispanic (other races and multiple races) have decreased slightly.

According to the time series plots above, levels of anxiety or depression for those with less than a high school diploma have dropped significantly while levels for those with a bachelor's degree or higher have decreased only slightly.

Observations and Comments

While investigating disparities over time by age, ethnicity, and education, we found that although groups varied in how their levels of anxiety or depression changed, all demographics experienced a large increase during the 20th week of our time period, followed by a steep drop around the 30th week.

We found that the disparities for age and ethnicity widened over the course of the pandemic, whereas the gap in levels of anxiety or depression between education levels narrowed.
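Whether a disparity widened or narrowed can be checked numerically by taking the difference between two subgroups in each week and comparing the first and last weeks. This is a sketch under assumed field names (`subgroup`, `week`, `value`) with invented values, not the computation used for the plots.

```python
# Invented weekly values for two placeholder subgroups; for illustration only.
series = [
    {"subgroup": "A", "week": 1, "value": 45.0},
    {"subgroup": "B", "week": 1, "value": 25.0},
    {"subgroup": "A", "week": 2, "value": 47.0},
    {"subgroup": "B", "week": 2, "value": 24.0},
    {"subgroup": "A", "week": 3, "value": 50.0},
    {"subgroup": "B", "week": 3, "value": 22.0},
]


def gap_by_week(rows, high_group, low_group):
    """Return {week: high_group value - low_group value}, ordered by week."""
    by_week = {}
    for row in rows:
        by_week.setdefault(row["week"], {})[row["subgroup"]] = row["value"]
    return {
        week: values[high_group] - values[low_group]
        for week, values in sorted(by_week.items())
        if high_group in values and low_group in values  # skip incomplete weeks
    }


gaps = gap_by_week(series, "A", "B")
weeks = list(gaps)
gap_change = gaps[weeks[-1]] - gaps[weeks[0]]  # positive means the gap widened
print(gaps, gap_change)
```

Comparing only the first and last weeks is sensitive to noise in those two points, so averaging the first few and last few weeks would be a more robust variant.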

Part 3: Webscraping Reddit

Methodology:

In Part 1, we found that the age demographic that experienced the highest levels of depression or anxiety was young adults aged 18-29. Since the main age demographic of Reddit users consists of young adults, we will scrape the anxiety and depression subreddits to determine whether young adults used Reddit as a platform to discuss mental health during the pandemic.

First, we will investigate when the top 1000 Reddit posts in the anxiety and depression subreddits were created to see if there was an increase in posts during the pandemic. Then, we will create a word cloud of the comments from the posts to investigate common stressors people faced and to see if there is an association between the pandemic and the comments.

Part 1: Gathering the Top 1000 Posts
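One way to gather top posts is through Reddit's public JSON listings, which return posts under `data.children` and a paging cursor under `data.after`. The sketch below is an assumption about tooling rather than the method used here. The `fetch_top_page` helper and its User-Agent string are hypothetical, the request may be rate limited or require authentication, and only the parsing step is exercised, on a hand-written page with invented posts.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_top_page(subreddit, after=None):
    """Fetch one page of all-time top posts (defined but not called here)."""
    query = {"t": "all", "limit": 100}
    if after:
        query["after"] = after  # cursor returned by the previous page
    url = f"https://www.reddit.com/r/{subreddit}/top.json?{urlencode(query)}"
    request = Request(url, headers={"User-Agent": "sta141b-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def extract_posts(listing):
    """Return (posts, next_cursor) from one listing page."""
    posts = []
    for child in listing["data"]["children"]:
        data = child["data"]
        posts.append({"title": data["title"], "created_utc": data["created_utc"]})
    return posts, listing["data"].get("after")


# Hand-written page in the listing shape, with invented posts.
sample_page = {
    "kind": "Listing",
    "data": {
        "after": "t3_example",
        "children": [
            {"kind": "t3", "data": {"title": "First post", "created_utc": 1577836800.0}},
            {"kind": "t3", "data": {"title": "Second post", "created_utc": 1585699200.0}},
        ],
    },
}
posts, next_cursor = extract_posts(sample_page)
print(posts, next_cursor)
```

Collecting 1000 posts would mean calling the fetch step repeatedly, passing each page's cursor as `after` until enough posts are gathered or the cursor is empty.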

Part 2: Plotting the Frequency of Posts

According to the graph above, there is a high frequency of posts starting in 2019.

Similar to the graph of the frequency of the top 1000 posts on the depression subreddit, we see a high frequency of posts starting in 2019.
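The counts behind a frequency graph like these can be tallied from each post's creation timestamp. The sketch assumes timestamps are Unix epoch seconds in UTC, and the sample values are invented.

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_year(timestamps):
    """Count posts by the UTC calendar year of their creation time."""
    years = (datetime.fromtimestamp(ts, tz=timezone.utc).year for ts in timestamps)
    return dict(sorted(Counter(years).items()))


# Invented creation times (Unix epoch seconds, UTC) for illustration.
sample_timestamps = [1546300800, 1577836800, 1585699200, 1609459200]
year_counts = posts_per_year(sample_timestamps)
print(year_counts)
```

Binning by month instead of year would make it easier to see whether an increase lines up with specific points in the pandemic.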

According to the heat map, while many of the posts don't mention COVID-19 directly, there is still a high frequency of posts after 2019.

Similar to the heat map of COVID-19 related posts on the depression subreddit, while many of the posts don't mention COVID-19 directly, there is still a high frequency of posts after 2019.
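The cells of a heat map like these can be built by crossing each post's year with whether its title mentions the pandemic. The keyword list below is an assumption, since the text does not say how COVID-19 related posts were identified, and the sample posts are invented.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed keyword list; matching is case-insensitive.
COVID_PATTERN = re.compile(r"covid|coronavirus|pandemic|quarantine|lockdown", re.IGNORECASE)


def covid_table(posts):
    """Count posts per (year, mentions_covid) cell."""
    table = Counter()
    for post in posts:
        year = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).year
        mentions = bool(COVID_PATTERN.search(post["title"]))
        table[(year, mentions)] += 1
    return dict(table)


# Invented posts for illustration.
sample_posts = [
    {"title": "Feeling stuck lately", "created_utc": 1546300800},
    {"title": "COVID-19 has made everything harder", "created_utc": 1585699200},
    {"title": "Small win today", "created_utc": 1577836800},
]
cells = covid_table(sample_posts)
print(cells)
```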

The graph above is a word cloud of the comments on the top 100 posts on the depression subreddit. We can see that "people", "time", and "work" appear in the cloud, meaning that they are mentioned very frequently.

The graph above is a word cloud of the comments on the top 100 posts on the anxiety subreddit. As in the word cloud for the depression subreddit, "people", "time", and "work" appear in the cloud, meaning that they are mentioned very frequently.
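A word cloud sizes each word by how often it occurs, so the underlying step is a word count with common filler words removed. The sketch below shows that counting step only. The stopword list is a small hypothetical one and the comments are invented.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "i", "my", "me", "is", "it",
    "in", "at", "for", "that", "this", "with", "but", "so", "all", "have", "no",
}


def word_frequencies(comments, top_n=3):
    """Return the top_n most frequent non-stopword words across all comments."""
    counts = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)


# Invented comments for illustration.
sample_comments = [
    "Work has been exhausting and people at work do not get it.",
    "I have no time for people.",
    "Time off work helped me.",
]
top_words = word_frequencies(sample_comments)
print(top_words)
```

Very general words such as "people" and "time" tend to rank highly in any large body of comments, so comparing counts against a pre-pandemic baseline would help show whether their prominence is specific to the pandemic.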

Observations and Comments:

We hypothesized that there would be an increase in post frequency in the depression and anxiety subreddits during the pandemic because young adults make up most of Reddit's user demographic and because we found in Part 1 that young adults had the highest levels of depression and anxiety.

In our frequency graphs, we found a large increase in post frequency starting in 2019, which coincides with the start of the COVID-19 pandemic in December 2019. Furthermore, our heat maps revealed that while only a small fraction of posts directly mentioned COVID-19, almost all of the most popular posts were created during the pandemic.

We found that "people", "time", and "work" appeared in both word clouds of comments from the top 100 posts in the anxiety and depression subreddits. This is significant because these topics were arguably heavily impacted by the pandemic.

Conclusion:

The goal of our project was to investigate the impact of the COVID-19 pandemic on mental health in the United States. To accomplish this, we looked at various data sources including the Household Pulse Survey and American Community Survey from the U.S. Census Bureau along with the social media site "Reddit".

Before analyzing our data set, we hypothesized that certain demographics would experience higher levels of depression or anxiety throughout the pandemic. We expected that older age groups would have higher indicator values because they are the most susceptible to COVID-19. We expected that Asian populations would have higher indicator values due to a rise in anti-Asian sentiment following the COVID-19 outbreak. We also expected that those with higher median household incomes would experience far lower levels of depression or anxiety than those with lower incomes.

In Part 1, we found that all of our hypotheses were wrong. It was actually the case that older age groups and Asian populations experienced the lowest levels of depression or anxiety. And we found that there was little to no correlation between household income and levels of depression or anxiety.

One of our key questions was "How has the progression of the COVID-19 pandemic affected the mental health of people in the United States?" In Part 2, we found that while groups experienced different changes in levels of anxiety or depression throughout the pandemic, all demographics experienced a large spike in levels during the 20th week of our time period, followed by a steep drop around the 30th week. Furthermore, we noticed that by the end of the time period, most people had lower levels of depression and anxiety than in the first week.

Another one of our key questions was "Has the COVID-19 pandemic increased mental health disparities in the United States?" To answer this question, we looked at the biggest disparities in age, ethnicity, and education and plotted their levels of depression or anxiety over the pandemic. We ultimately found that disparities for age and ethnicity widened over the course of the pandemic while the gap in levels of anxiety or depression between education levels narrowed.

After finding that young adults were the most susceptible to high levels of depression or anxiety, we turned to Reddit to investigate whether young adults used it as a platform to discuss mental health during the pandemic, since young adults are the biggest user demographic of Reddit. In particular, we scraped posts on the depression and anxiety subreddits. We found that while only a small portion of the most popular posts were COVID-19 related, the vast majority of the posts were created during the pandemic. We created word clouds to see what was most commonly discussed within these posts and discovered that topics such as "people", "work", and "time", all of which were severely impacted by the pandemic, were frequently mentioned.

Sources:

https://data.cdc.gov/NCHS/Indicators-of-Anxiety-or-Depression-Based-on-Repor/8pt5-q6wp/data

https://www.census.gov/programs-surveys/acs/data/experimental-data/1-year.html

https://www.reddit.com/

https://www.oberlo.com/blog/reddit-statistics

Appendix